Quantifying the Limits and Success of Extractive Summarization Systems Across Domains

نویسندگان

Hakan Ceylan

Rada Mihalcea

Umut O'zertem

Elena Lloret

Manuel Palomar

چکیده

This paper analyzes the topic identification stage of single-document automatic text summarization across four different domains, consisting of newswire, literary, scientific and legal documents. We present a study that explores the summary space of each domain via an exhaustive search strategy, and finds the probability density function (pdf) of the ROUGE score distributions for each domain. We then use this pdf to calculate the percentile rank of extractive summarization systems. Our results introduce a new way to judge the success of automatic summarization systems and bring quantified explanations to questions such as why it was so hard for the systems to date to have a statistically significant improvement over the lead baseline in the news domain.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Text Summarization Using Cuckoo Search Optimization Algorithm

Today, with rapid growth of the World Wide Web and creation of Internet sites and online text resources, text summarization issue is highly attended by various researchers. Extractive-based text summarization is an important summarization method which is included of selecting the top representative sentences from the input document. When, we are facing into large data volume documents, the extr...

متن کامل

Biogeography-Based Optimization Algorithm for Automatic Extractive Text Summarization

    Given the increasing number of documents, sites, online sources, and the users’ desire to quickly access information, automatic textual summarization has caught the attention of many researchers in this field. Researchers have presented different methods for text summarization as well as a useful summary of those texts including relevant document sentences. This study select...

متن کامل

Outlier Document Filtering Applied to the Extractive Summarization

Summarization requires selection of the more informative sentences within a set of documents. Generally, process assumes the document set includes related topics to a subject. However, some of the documents may be outlier and the effect of an outlier document might affect the success of extractive summary. Research is focused on filtering documents at the extraction stage these are outlier. Ext...

متن کامل

Who wrote What Where: Analyzing the content of human and automatic summaries

Abstractive summarization has been a longstanding and long-term goal in automatic summarization, because systems that can generate abstracts demonstrate a deeper understanding of language and the meaning of documents than systems that merely extract sentences from those documents. Genest (2009) showed that summaries from the top automatic summarizers are judged as comparable to manual extractiv...

متن کامل

A Subjective Logic Framework for Multi-Document Summarization

In this paper we propose SubSum, a subjective logic framework for sentence-based extractive multi-document summarization. Document summaries perceived by humans are subjective in nature as human judgements of sentence relevancy are inconsistent and laden with uncertainty. SubSum captures this uncertainty and extracts significant sentences from a document cluster to generate extractive summaries...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2010

Quantifying the Limits and Success of Extractive Summarization Systems Across Domains

نویسندگان

چکیده

منابع مشابه

Text Summarization Using Cuckoo Search Optimization Algorithm

Biogeography-Based Optimization Algorithm for Automatic Extractive Text Summarization

Outlier Document Filtering Applied to the Extractive Summarization

Who wrote What Where: Analyzing the content of human and automatic summaries

A Subjective Logic Framework for Multi-Document Summarization

عنوان ژورنال:

اشتراک گذاری